Introduction#
Quantitative MRI (qMRI) has a reproducibility problem (Keenan et al. 2019). Despite the promise that qMRI improves specificity and reproducibility of measurements over clinical MRI scans, few qMRI techniques have entered the clinic. Even the most fundamental MR parameters cannot be measured with sufficient reproducibility and precision across clinical scanners to pass the second of six stages of technical assessment for clinical biomarkers (Fryback and Thornbury 1991; Schweitzer 2016; Seiberlich et al. 2020). Nearly half a century has passed since the first quantitative MRI maps (spin-lattice relaxation time, T1) were reported (Pykett and Mansfield 1978), yet there is still disagreement in reported values for this fundamental parameter (T1) across different sites, vendors, and implementations (Stikov et al. 2015).
Amongst fundamental MRI parameters, T1 holds significant importance. It represents the time it takes for longitudinal magnetization to recover after being disturbed by an RF pulse. The T1 value varies based on molecular mobility and magnetic field strength (Bottomley et al. 1984; Wansapura et al. 1999; Dieringer et al. 2014), making it a valuable parameter for distinguishing between tissue types. Accurate knowledge of T1 values is essential for optimizing clinical MRI pulse sequences for contrast and time efficiency (Ernst and Anderson 1966; Redpath and Smith 1994; Tofts 1997) and as a calibration parameter for other quantitative MRI techniques (Sled and Pike 2001; Yuan et al. 2012). Amongst the number of techniques to measure T1, inversion recovery (IR) (Drain 1949; Hahn 1949) is widely held as being the gold standard T1 mapping technique, as it is very robust against other effects (e.g. B1 inhomogeneity) or potential errors in measurements (e.g. insufficient spoiling) (Stikov et al. 2015). However, because the technique requires a long repetition time (TR > T1), it is very slow and impractical for whole-organ measurements, limiting its clinical use. In practice, it is mostly used as a reference to validate other T1 mapping techniques, such as variable flip angle imaging (VFA) (Fram et al. 1987; Deoni, Rutt, and Peters 2003; Cheng and Wright 2006) and MP2RAGE (Marques et al. 2010).
Efforts have been made to develop quantitative MRI phantoms to assist in standardizing T1 mapping methods (Keenan et al. 2018). A quantitative MRI standard system phantom was created in a joint project between ISMRM and the National Institute of Standards and Technology (NIST) (Stupic et al. 2021), and has since been commercialized (Premium System Phantom, CaliberMRI, Boulder, Colorado). The spherical phantom has a 57-element fiducial array containing spheres with doped liquids that model a wide range of T1, T2, and PD values. The reference values of each sphere were measured using NMR at 1.5T and 3.0T. The standardized concentrations for relaxometry values established as references by NIST are also used by another company for their quantitative relaxometry MRI phantoms (Gold Standard Phantoms Ltd., Rochester, England). The cardiac TIMES phantom (Captur et al. 2016) is another commercially available system phantom used for T1, focusing on T1 and T2 values in blood and heart muscles, pre- and post-contrast. The ISMRM/NIST phantom has already been used in a few large multicenter studies, such as (Bane et al. 2018), which compared measurements at eight sites on a single phantom using the inversion recovery and VFA T1 mapping protocols recommended by NIST for their phantom, as well as some site-specific imaging protocols used for DCE. In another study led by NIST researchers (Keenan et al. 2021), T1 measurements were done at two clinical field strengths (1.5T and 3.0T) and 27 MRI systems (three vendors) using the recommended NIST protocols. That study found no significant relationship between T1 discrepancies of the measurements and the MRI vendors used.
The 2020 ISMRM reproducibility challenge posed the following question: will an imaging protocol independently-implemented at multiple centers reliably measure what is considered one of the fundamental MR parameters (T1) using the most robust technique (inversion recovery) in a standardized phantom (ISMRM/NIST system phantom)? The challenge aimed at assessing the variability in measurements due to different groups reproducing a protocol from a specific publication (Barral et al. 2010). As the focus of this challenge was on reproducibility, the challenge design emphasized the use of reproducible research practices, such as sharing code, pipelines, data, and scripts to reproduce figures. To be more inclusive and broaden participation, participants were also invited to submit data acquired on healthy human subjects if they did not have access to the necessary ISMRM/NIST system phantom, provided that their local and institutional ethics protocols permitted it.
Materials#
The challenge was launched for those with access to the International Society of Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) system phantom (Stupic et al. 2021) (Premium System Phantom, CaliberMRI, Boulder, Colorado). Two versions of the phantom have been produced with slightly different quantitative parameter values in the liquid spheres. Phantoms with serial numbers 0041 or less are referred to as “Version 1”, and those 0042 or greater are “Version 2”. The phantom has three plates containing sets of 14 spheres for ranges of proton density (PD), T1 (NiCl2), and T2 (MnCl2) values. Reference T1 values at 20 °C and 3.0 T for the T1 plate are listed in Table 1 for both versions of the phantom. Participants were instructed to record the temperature before and after scanning the phantom using the phantom's internal thermometer. Instructions for positioning and setting up the phantom were provided to participants through the NIST website.
Table 1. Reference T1 values of the “T1 plate” of the standard phantom (for both phantom versions) measured at 20 °C and 3.0 T. Phantoms with serial numbers 0041 or less are referred to as “Version 1”, and those 0042 or greater are “Version 2”.

| Sphere # | Version 1 (ms) | Version 2 (ms) |
|---|---|---|
| 1 | 1989 ± 1.0 | 1883.97 ± 30.32 |
| 2 | 1454 ± 2.5 | 1330.16 ± 20.41 |
| 3 | 984.1 ± 0.33 | 987.27 ± 14.22 |
| 4 | 706 ± 1.0 | 690.08 ± 10.12 |
| 5 | 496.7 ± 0.41 | 484.97 ± 7.06 |
| 6 | 351.5 ± 0.91 | 341.58 ± 4.97 |
| 7 | 247.13 ± 0.086 | 240.86 ± 3.51 |
| 8 | 175.3 ± 0.11 | 174.95 ± 2.48 |
| 9 | 125.9 ± 0.33 | 121.08 ± 1.75 |
| 10 | 89.0 ± 0.17 | 85.75 ± 1.24 |
| 11 | 62.7 ± 0.13 | 60.21 ± 0.87 |
| 12 | 44.53 ± 0.090 | 42.89 ± 0.44 |
| 13 | 30.84 ± 0.016 | 30.40 ± 0.62 |
| 14 | 21.719 ± 0.005 | 21.44 ± 0.31 |
Participants without access to the ISMRM/NIST phantom were encouraged to collect healthy human brain T1 maps following their institutional ethical guidelines and with the subjects' consent to participate in the challenge. To ensure consistency across datasets, single-slice positioning parallel to the AC-PC line was recommended. All submitted datasets and subsequent fitted T1 maps were to be uploaded to the data sharing website OSF.io, and thus participants were instructed to obtain consent for open-data sharing before scanning and to anonymize their data before submission. As the submitted single-slice inversion recovery images would be along the AC-PC line, they were unlikely to contain sufficient information for facial identification, and therefore de-masking was not recommended. Participants who submitted human data for this challenge provided written confirmation to the organizers that their data for this challenge was acquired in accordance with the requirements of their institutional ethics committee (or equivalent regulatory body) and that the subjects had consented to sharing their data as described in the challenge.
Protocol#
Participants were instructed to acquire T1 mapping data using the spin-echo inversion recovery protocol reported in (Barral et al. 2010), as detailed in Table 2. This protocol uses four inversion times optimized for human brain T1 values and uses a relatively short TR (2550 ms). It’s important to note that this acquisition protocol is not suitable for T1 mapping fitting models that assume TR > 5T1. Instead, more general models of inversion recovery, such as the Barral et al. fitting model described in Section 2.4.1, can be used to fit this data.
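To illustrate why a long-TR model is a poor match for this protocol, the sketch below compares two textbook inversion recovery expressions at the protocol's TR and TIs. Both functions are assumptions made for illustration (ideal inversion, TE neglected, unit M0); they are not the fitting model used in the challenge, and the T1 values are arbitrary examples.

```python
import math

TR_MS = 2550.0
TI_MS = [50.0, 400.0, 1100.0, 2500.0]

def ir_signal_long_tr(ti_ms, t1_ms, m0=1.0):
    """Long-TR approximation: full recovery assumed before each inversion."""
    return m0 * (1.0 - 2.0 * math.exp(-ti_ms / t1_ms))

def ir_signal_finite_tr(ti_ms, t1_ms, tr_ms, m0=1.0):
    """Simplified finite-TR form (ideal inversion, TE neglected)."""
    return m0 * (1.0 - 2.0 * math.exp(-ti_ms / t1_ms) + math.exp(-tr_ms / t1_ms))

def max_model_gap(t1_ms):
    """Largest gap between the two models over the protocol TIs, in units of M0."""
    return max(
        abs(ir_signal_finite_tr(ti, t1_ms, TR_MS) - ir_signal_long_tr(ti, t1_ms))
        for ti in TI_MS
    )

# Arbitrary example T1 values spanning short to long
for t1 in (500.0, 1000.0, 1500.0):
    print(f"T1 = {t1:6.0f} ms  TR/T1 = {TR_MS / t1:4.2f}  gap = {max_model_gap(t1):.3f} M0")
```

Under these assumptions the gap equals exp(-TR/T1), which is negligible when TR exceeds 5T1 but grows to a sizeable fraction of M0 for the longer T1 values.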
Table 2. Imaging protocol for inversion recovery T1 mapping proposed to the participants for the 2020 joint RRSG-qMRSG reproducibility challenge. The protocol is the brain imaging protocol used in (Barral et al. 2010), which is meant for the T1 values observed in healthy human brains.

| Pulse Sequence | Spin-echo inversion recovery |
|---|---|
| Repetition Time (TR) | 2550 ms |
| Inversion Time (TI) | 50, 400, 1100, 2500 ms |
| Echo Time (TE) | 14 ms |
| In-plane resolution | 1 × 1 mm² |
| Slice thickness | 2 mm |
Participants were advised to adhere to this protocol as closely as possible, but to report any differences in protocol parameters due to technical limitations of their scanners and/or software. The recommended data exportation type was complex (magnitude & phase, or real & imaginary), and magnitude-only data was also acceptable if complex data could not be exported.
Data Submissions#
Data submissions for the challenge were managed through a dedicated repository on GitHub, accessible at https://github.com/rrsg2020/data_submission. This allowed transparent and open review of the submissions, as well as better standardization of the process. All datasets had to be converted to the NIfTI file format, and images from different TIs needed to be concatenated into the fourth (or “time”) dimension. Magnitude-only datasets required one NIfTI file, while complex datasets required two files (magnitude and phase, or real and imaginary). Additionally, a configuration file containing submission, dataset, and acquisition details (such as data type, submitter name and email, site details, phantom or volunteer details, and imaging protocol details) was required for each submitted dataset to ensure that the information was standardized and easily found. Each submission was reviewed to confirm that guidelines were followed, and then datasets and configuration files were uploaded to OSF.io. At this stage, a Jupyter Notebook (Kluyver et al. 2016; Beg et al. 2021) pipeline was also used to generate T1 maps for quality assurance. Links to the Jupyter Notebook for reproducing the T1 map were shared for each submission using the MyBinder platform, ensuring that computation environments were reproducible without the need to install software packages on users' local computers.
Fitting Model and Pipeline#
A reduced-dimension non-linear least squares (RD-NLS) approach was used to fit the complex general inversion recovery signal equation:

$$ S(TI) = a + b\,e^{-TI/T_1} $$

where a and b are complex constants. This approach, introduced in (Barral et al. 2010), models the general T1 signal equation without approximating for a very long TR; the constants a and b implicitly absorb the dependence on TR. Barral et al. shared the implementation of the fitting algorithm used in their paper. To facilitate its use in our pipelines, we used a wrapper around this code available in the open-source software qMRLab (Cabana et al. 2015; Karakuzu et al. 2020), which provides a standardized API to call the fitting in MATLAB/Octave scripts.
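The idea behind a reduced-dimension fit can be sketched as follows, assuming the signal model S(TI) = a + b·exp(-TI/T1): for any fixed T1 the complex constants a and b enter linearly, so they can be solved in closed form, leaving a one-dimensional search over T1. This is a minimal illustration on simulated, noise-free data with a plain 1 ms grid search; it is not the Barral et al. or qMRLab implementation, and the function names and simulated values are hypothetical.

```python
import cmath
import math

def solve_linear_terms(ti_ms, signal, t1_ms):
    """For a fixed T1, return the least-squares complex (a, b) and the residual."""
    e = [math.exp(-ti / t1_ms) for ti in ti_ms]
    n = len(e)
    sum_e = sum(e)
    sum_ee = sum(x * x for x in e)
    sum_s = sum(signal)
    sum_es = sum(x * s for x, s in zip(e, signal))
    det = n * sum_ee - sum_e ** 2
    # Normal equations of the 2-parameter linear problem s = a + b * e
    a = (sum_ee * sum_s - sum_e * sum_es) / det
    b = (n * sum_es - sum_e * sum_s) / det
    residual = sum(abs(s - (a + b * x)) ** 2 for x, s in zip(e, signal))
    return a, b, residual

def fit_t1_grid(ti_ms, signal, t1_grid_ms):
    """Search the T1 grid, keeping the value with the smallest residual."""
    best = None
    for t1 in t1_grid_ms:
        a, b, residual = solve_linear_terms(ti_ms, signal, t1)
        if best is None or residual < best[3]:
            best = (t1, a, b, residual)
    return best

# Simulated voxel (hypothetical values) sampled at the protocol's four TIs
ti_ms = [50.0, 400.0, 1100.0, 2500.0]
true_t1 = 900.0
true_a = 1.0 * cmath.exp(0.3j)
true_b = -1.9 * cmath.exp(0.3j)
signal = [true_a + true_b * math.exp(-ti / true_t1) for ti in ti_ms]

t1_hat, a_hat, b_hat, _ = fit_t1_grid(ti_ms, signal, range(10, 5001))
print(f"Estimated T1 = {t1_hat} ms")
```

Because a and b are eliminated analytically, the search is over a single parameter, which makes an exhaustive grid feasible and avoids the local minima that a full non-linear search over all parameters can encounter.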
A Jupyter Notebook data processing pipeline was written using MATLAB/Octave. This pipeline automatically downloads all the datasets, loads each dataset configuration file, fits the T1 data voxel-wise, and exports the resulting T1 map to the NIfTI and PNG formats for quality assurance. This pipeline is available in a GitHub repository (https://github.com/rrsg2020/t1_fitting_pipeline, filename: RRSG_T1_fitting.ipynb). Once all submissions were collected and the pipeline was executed, the T1 maps were uploaded to OSF.io.
Image Labeling & Registration#
The T1 plate of the phantom had 14 spheres that were labeled as the regions-of-interest (ROI) using a numerical mask template created in MATLAB, provided by NIST researchers (Figure 1–a). To avoid potential edge effects in the T1 maps, the ROI labels were reduced to 60% of the expected sphere diameter. A registration pipeline in Python using the Advanced Normalization Tools (ANTs) (Avants, Tustison, and Song 2009) was developed and shared in the “analysis” repository of our GitHub organization (https://github.com/rrsg2020/analysis, filename: register_t1maps_nist.py). The ROI labels template was nonlinearly registered to each submitted dataset’s T1 map uploaded to OSF.io.
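The effect of shrinking an ROI label to 60% of the sphere diameter can be sketched with a simple circular mask. The grid size, sphere centre, and 15-voxel diameter below are hypothetical values chosen for illustration, and the sketch covers only the mask shrinking, not the NIST template or the ANTs registration.

```python
def disk_mask(center_rc, diameter_vox, shape):
    """Return the set of (row, col) voxels inside a disk of the given diameter."""
    center_r, center_c = center_rc
    radius_sq = (diameter_vox / 2.0) ** 2
    return {
        (r, c)
        for r in range(shape[0])
        for c in range(shape[1])
        if (r - center_r) ** 2 + (c - center_c) ** 2 <= radius_sq
    }

# Hypothetical geometry: 64x64 slice, sphere centred at (32, 32), 15 voxels wide
shape = (64, 64)
center = (32, 32)
sphere_diameter_vox = 15.0

full_sphere = disk_mask(center, sphere_diameter_vox, shape)
roi = disk_mask(center, 0.6 * sphere_diameter_vox, shape)

print(f"Full sphere: {len(full_sphere)} voxels, ROI: {len(roi)} voxels")
print(f"Fraction of sphere area kept: {len(roi) / len(full_sphere):.2f}")
```

Keeping 60% of the diameter retains roughly a third of the cross-sectional area, trading voxel count for distance from the sphere boundary.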
Figure 1. ROI selection for the NIST phantom (a) and the human brain (b). a) The 14 ROIs (shades of blue/green) were automatically generated using a script provided by NIST. In yellow are the three reference pins in the phantom, and are not ROIs or spheres. b) ROIs were manually segmented in the human brains in four regions: the genu (yellow, 5x5 voxels), splenium (green, 5x5 voxels), deep gray matter (blue, 5x5 voxels), and cortical gray matter (red, three sets of 3x3 voxels). Note: due to differences in slice positioning from the single-slice datasets provided by certain sites, for some datasets it was not possible to manually segment an ROI in the genu or deep gray matter. In the case of the missing genu, left or right frontal white matter was selected; for deep grey matter, it was omitted entirely for those cases.
ROIs were manually segmented using FSLeyes (McCarthy 2019) in four regions for human datasets (Figure 1-b): genu, splenium, deep gray matter, and cortical gray matter. Automatic segmentation was not used because the data was single-slice and there was inconsistent slice positioning between datasets.
Analysis and Statistics#
Analysis code and scripts were developed and shared in a version-tracked public GitHub repository. Python-based Jupyter Notebooks were used for both the quality assurance and main analysis workflows. The computational environment requirements were containerized in Docker (Merkel 2014; Boettiger 2015), allowing for an executable environment that can reproduce the analysis in a web browser through MyBinder (Project Jupyter et al. 2018). Backend Python files handled reference data, database handling, ROI masking, and general analysis tools, while configuration files managed the dataset information; the datasets were downloaded and pooled using a script (make_pooled_datasets.py). The databases were created using a reproducible Jupyter Notebook script and subsequently saved in the repository.
For the NIST phantom data, mean T1 values for each ROI were compared with temperature-corrected reference values and visualized in three different types of plots (linear axes, log-log axes, and error relative to the reference value). This comparison was repeated for individual measurements at each site and for all measurements grouped together. Temperature correction was carried out via interpolation of the set of reference NIST T1 values between 16 °C and 26 °C (2 °C intervals), listed in the phantom technical specifications. For the human datasets, a notebook was created to plot the mean and standard deviations for each tissue ROI from all submissions from all sites. All the quality assurance and analysis plot images were saved to the repository for ease-of-access and a timestamped version-controlled record of the state of the analysis figures. The database files of ROI values and acquisition details for all submissions were also saved to the repository.
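The temperature correction step can be sketched as interpolation over reference values tabulated at 16 °C to 26 °C in 2 °C steps. This sketch assumes linear interpolation, which the text does not specify, and the tabulated T1 values are made up for illustration; they are not taken from the phantom technical specifications.

```python
from bisect import bisect_right

# Temperatures at which reference values are tabulated (from the text)
REF_TEMPS_C = [16.0, 18.0, 20.0, 22.0, 24.0, 26.0]

# Hypothetical reference T1 values (ms) for one sphere at those temperatures
EXAMPLE_T1_MS = [960.0, 972.0, 984.0, 996.0, 1008.0, 1020.0]

def interpolate_reference_t1(temperature_c, temps_c, t1_by_temp_ms):
    """Linearly interpolate a reference T1 at the measured phantom temperature."""
    if not temps_c[0] <= temperature_c <= temps_c[-1]:
        raise ValueError("Temperature outside the tabulated range.")
    # Index of the upper bracketing temperature, clamped to the last entry
    hi = min(bisect_right(temps_c, temperature_c), len(temps_c) - 1)
    lo = hi - 1
    weight = (temperature_c - temps_c[lo]) / (temps_c[hi] - temps_c[lo])
    return (1.0 - weight) * t1_by_temp_ms[lo] + weight * t1_by_temp_ms[hi]

corrected = interpolate_reference_t1(21.0, REF_TEMPS_C, EXAMPLE_T1_MS)
print(f"Reference T1 at 21.0 C: {corrected:.1f} ms")
```

Raising an error outside the tabulated range, rather than extrapolating, is a design choice of this sketch.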
An interactive dashboard was developed in Dash by Plotly (Plotly Technologies Inc. 2015) and hosted by NeuroLibre (Karakuzu et al. 2022) to provide an interactive approach for exploring the data, analysis, and statistics of the challenge results. The dashboard visualizes the mean, median, standard deviation, and coefficient of variation for each phantom sphere and brain ROI. The data was collected from the pre-prepared databases of masked ROI values and incorporated other database information, such as phantom version, temperature, MRI manufacturer, and reference values. The interactive dashboard displays these results for all measurements at all sites.
Submissions#
Nineteen participants submitted data that were approved, comprising 41 T1 maps of the ISMRM/NIST system phantom and 56 brain T1 maps. It should be noted that these numbers include a subset of measurements where both complex and magnitude-only data from the same acquisition were used to fit T1 maps, thus the total number of unique acquisitions is lower than the numbers reported above. The datasets were collected on scanners from three MRI manufacturers (Siemens, GE, Philips) and were acquired at 3.0 T, except for one dataset acquired at 350 mT. To showcase the heterogeneity of the independently-implemented submissions, Figure 2 displays six T1 maps of the phantoms submitted to the challenge.
Several submissions went beyond the minimum acquisition and acquired additional datasets using the NIST phantom, such as a traveling phantom (7 scanners), scan-rescan, same-day rescans on two MRIs, short TR vs long TR, and 4 point TI vs 14 point TI. For humans, one site acquired 13 subjects on three scanners (two manufacturers), one site acquired 6 subjects, and one site acquired a subject using two different head coils (20 channels vs. 64 channels).
Figure 2. Example T1 maps that were submitted. Note the differences in acquisitions (e.g. FOV (top middle), orientation (bottom right), k-space pattern (top left and right)) and the artifacts in the T1 maps (e.g. ghosting (bottom left), ringing (bottom middle), noise profiles (top left and bottom right), deformation/slice mispositioning (top right)) arising from the independently-implemented acquisition protocols.

from os import path
import os

# Clone the analysis repository on first run (IPython shell escape),
# then remove the notebooks and markdown files at the top level of the clone
if not path.isdir('analysis'):
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)
    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))
# Imports
from pathlib import Path
import pandas as pd
import json
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import Video
import glob
from analysis.src.plots import *
from analysis.make_pooled_datasets import *
# Configurations
configFile = Path('analysis/configs/3T_NIST_T1maps.json')
data_folder_name = 'analysis/3T_NIST_T1maps'
output_gif_folder = Path("analysis/plots/01-wholedataset_gif_NIST/")
output_gif_name = 'NIST.gif'
# Download datasets
if not Path(data_folder_name).exists():
    make_pooled_dataset(configFile, data_folder_name)

with open(configFile) as json_file:
    configJson = json.load(json_file)
def get_image(dataset_name, key2):
    # Load T1 image data
    t1_file = configJson[dataset_name]['datasets'][key2]['imagePath']
    t1 = nib.load(Path(data_folder_name) / t1_file)
    t1_volume = t1.get_fdata()

    # Handle 2D vs 3D volume case
    dims = t1_volume.shape
    if (len(dims) == 2) or (np.min(dims) == 1):
        im = np.rot90(t1_volume)
    else:
        # Take the middle slice along the smallest dimension
        index_smallest_dim = np.argmin(dims)
        numberOfSlices = dims[index_smallest_dim]
        midSlice = int(np.round(numberOfSlices/2))

        if index_smallest_dim == 0:
            im = np.rot90(np.squeeze(t1_volume[midSlice,:,:]))
        elif index_smallest_dim == 1:
            im = np.rot90(np.squeeze(t1_volume[:,midSlice,:]))
        elif index_smallest_dim == 2:
            im = np.rot90(np.squeeze(t1_volume[:,:,midSlice]))

    # Pixel-index axes for plotting
    xAxis = np.linspace(0,im.shape[0]-1, num=im.shape[0])
    yAxis = np.linspace(0,im.shape[1]-1, num=im.shape[1])
    return im, xAxis, yAxis
im_1, xAxis_1, yAxis_1 = get_image('wang_MDAnderson_NIST', 'day2_mag')
im_2, xAxis_2, yAxis_2 = get_image('CStehningPhilipsClinicalScienceGermany_NIST', 'Bonn_MR1_magnitude')
im_3, xAxis_3, yAxis_3 = get_image('mrel_usc_NIST', 'Session1_MR1')
im_4, xAxis_4, yAxis_4 = get_image('karakuzu_polymtl_NIST', 'mni')
im_5, xAxis_5, yAxis_5 = get_image('madelinecarr_lha_NIST', 'one')
im_6, xAxis_6, yAxis_6 = get_image('matthewgrechsollars_ICL_NIST', 'magnitude')
im_6 = np.flipud(im_6)
# PYTHON CODE
# Module imports
import matplotlib.pyplot as plt
from PIL import Image
from matplotlib.image import imread
import scipy.io
import plotly.graph_objs as go
import numpy as np
from plotly import __version__
from plotly.offline import init_notebook_mode, iplot, plot
config={'showLink': False, 'displayModeBar': False}
init_notebook_mode(connected=True)
from IPython.display import display, HTML
import os
import markdown
import random
from scipy.integrate import quad
import warnings
warnings.filterwarnings('ignore')
# Build one heatmap per T1 map; only the first map is visible initially
images = [im_1, im_2, im_3, im_4, im_5, im_6]
x_axes = [xAxis_1, xAxis_2, xAxis_3, xAxis_4, xAxis_5, xAxis_6]
y_axes = [yAxis_1, yAxis_2, yAxis_3, yAxis_4, yAxis_5, yAxis_6]
trace_names = ['id: 10.003', 'id: 6.005', 'id: 4.001',
               'id: 8.002', 'id: 9.001', 'id: 1.001']
data = [
    go.Heatmap(x=x,
               y=y,
               z=im,
               zmin=0,
               zmax=3000,
               colorscale='viridis',
               xaxis='x2',
               yaxis='y2',
               visible=(i == 0),
               name=name)
    for i, (x, y, im, name) in enumerate(zip(x_axes, y_axes, images, trace_names))
]
# Dropdown menu: each button makes exactly one of the six maps visible
n_maps = 6
updatemenus = [
    dict(active=0,
         x=0.4,
         xanchor='left',
         y=-0.15,
         yanchor='bottom',
         direction='up',
         font=dict(family='Times New Roman', size=16),
         buttons=[
             dict(label=f'Map {i + 1}',
                  method='update',
                  args=[{'visible': [j == i for j in range(n_maps)]}])
             for i in range(n_maps)
         ])
]
# Axis style shared by all four axes: fixed range, no grid, ticks, or labels
hidden_axis = dict(range=[0, 255], autorange=False,
                   showgrid=False, zeroline=False, showticklabels=False,
                   ticks='', domain=[0, 1])
layout = dict(
    width=500,
    height=500,
    margin=dict(t=40, r=50, b=10, l=50),
    annotations=[
        dict(
            x=1.25,
            y=1.1,
            showarrow=False,
            text='T<sub>1</sub> (ms)',
            font=dict(family='Times New Roman', size=26),
            xref='paper',
            yref='paper'
        ),
    ],
    xaxis=dict(hidden_axis),
    yaxis=dict(hidden_axis),
    xaxis2=dict(hidden_axis),
    yaxis2=dict(hidden_axis, anchor='x2'),
    showlegend=False,
    autosize=False,
    updatemenus=updatemenus
)
fig = dict(data=data, layout=layout)
#iplot(fig, filename = 'basic-heatmap', config = config)
plot(fig, filename = 'figure2.html', config = config)
display(HTML('figure2.html'))
Phantom results#
An overview of the T1 results for the submitted NIST phantom datasets is displayed in Figure 3. The same data is presented in each column with different axes types (linear, log, and error) to better visualize the results. The left column (a,d) shows the mean T1 and standard deviation in each of the 14 ROIs plotted against temperature-corrected reference T1 values using linear axes for a representative dataset (a) and all datasets (d). The middle column (b,e) displays the same mean T1 datasets as (a,d) but using log-log axes. The right column (c,f) displays the error (%) of the measured T1 relative to the temperature-corrected NIST reference values; the dotted lines represent a ±10% error. Figure 3a shows a strong linear trend and slight underestimation (slope = 0.98, intercept = -14 ms) for this dataset in comparison to the reference T1 values. However, this trend breaks down for low T1 values (T1 < 100-200 ms), as seen in the log-log plot (Figure 3b), which was expected because the imaging protocol is optimized for human T1 values (T1 > 500 ms). Errors exceeding 10% are observed for T1 values of phantom spheres below this threshold (Figure 3c). These trends are observed for the entire-dataset plots as well (Figure 3d-f). More variability is seen in Figure 3d around the identity diagonal at very high T1 (T1 ~ 2000 ms) than towards the WM-GM values (T1 ~ 600-1400 ms), which is less apparent in the log-log plot (Figure 3e). In addition to the low T1 values exceeding the 10% error threshold (Figure 3f), a few measurements (~3-4) with outlier values were observed in the human tissue range.
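The error metric and the ±10% threshold described above can be sketched as follows. The sign convention (measured minus reference, divided by reference) is an assumption, the helper names are hypothetical rather than the repository's own functions, and the measured values are invented for illustration.

```python
def percent_error(measured_ms, reference_ms):
    """Signed error of a measured T1 relative to its reference, in percent."""
    return 100.0 * (measured_ms - reference_ms) / reference_ms

def flag_outside_tolerance(measured, reference, tolerance_pct=10.0):
    """Return True for each sphere whose absolute error exceeds the tolerance."""
    return [
        abs(percent_error(m, r)) > tolerance_pct
        for m, r in zip(measured, reference)
    ]

# Reference values for three spheres and invented measurements (ms)
reference_ms = [1989.0, 984.1, 89.0]
measured_ms = [1950.0, 965.0, 110.0]

for m, r in zip(measured_ms, reference_ms):
    print(f"reference {r:7.1f} ms  measured {m:7.1f} ms  error {percent_error(m, r):+6.1f} %")
print(flag_outside_tolerance(measured_ms, reference_ms))
```

A negative error corresponds to underestimation under this convention.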
Figure 3. Measured mean T1 values vs. temperature-corrected NIST reference values of the phantom spheres presented as linear plots (a,d), log-log plots (b,e), and plots of the error relative to the reference T1 value (c,f). Plots (a–c) are of an example single dataset, whereas plots (d–f) are of all acquired datasets.
%reset
from os import path
import os
# Clone the analysis repository on first run (IPython shell escape),
# then remove the notebooks and markdown files at the top level of the clone
if not path.isdir('analysis'):
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)
    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))
# Imports
import warnings
warnings.filterwarnings("ignore")
from pathlib import Path
import pandas as pd
import json
import nibabel as nib
import numpy as np
from analysis.src.database import *
from analysis.src.nist import get_reference_NIST_values, get_NIST_ids
from analysis.src.tools import calc_error
from analysis.src.nist import temperature_correction
import matplotlib.pyplot as plt
plt.style.use('analysis/custom_matplotlibrc')
plt.rcParams["figure.figsize"] = (10,10)
fig_id = 0
database_path = Path('analysis/databases/3T_NIST_T1maps_database.pkl')
output_folder = Path("analysis/plots/03_singledataset_scatter_NIST-temperature-corrected/")
estimate_type = 'mean' # median or mean
## Define Functions
def plot_single_scatter(x, y, y_std,
                        title, x_label, y_label,
                        file_prefix, folder_path, fig_id,
                        y_type='linear'):
    # Compare strings with ==, not identity (is)
    if y_type == 'linear':
        plt.errorbar(x,y, y_std, fmt='o', solid_capstyle='projecting')
        ax = plt.gca()
        ax.axline((1, 1), slope=1, linestyle='dashed')  # identity line
        ax.set_ylim(ymin=0, ymax=2500)
        ax.set_xlim(xmin=0, xmax=2500)
    if y_type == 'log':
        plt.loglog(x,y,'o')
        ax = plt.gca()
        ax.set_ylim(ymin=20, ymax=2500)
        ax.set_xlim(xmin=20, xmax=2500)
    if y_type == 'error_t1':
        plt.errorbar(x,calc_error(x,y), fmt='o')
        ax = plt.gca()
        ax.axline((1, 0), slope=0, color='k')
        # Dashed lines mark the +/-10% error bounds
        ax.axline((1, -10), slope=0, linestyle='dashed', color='k')
        ax.axline((1, 10), slope=0, linestyle='dashed', color='k')
        ax.set_ylim(ymin=-100, ymax=100)
        ax.set_xlim(xmin=0, xmax=2500)

    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)

    fig = plt.gcf()

    folder_path.mkdir(parents=True, exist_ok=True)

    # Zero-pad the figure index to two digits
    filename = f"{fig_id:02d}_{file_prefix}"

    fig.savefig(folder_path / (str(filename) + '.svg'), facecolor='white')
    fig.savefig(folder_path / (str(filename) + '.png'), facecolor='white')
    fig_id = fig_id + 1
    plt.show()
    return fig_id
## Load database
df = pd.read_pickle(database_path)
## Initialize array
dataset_estimate = np.array([])
dataset_std = np.array([])
index = 6.001
serial_number = df.loc[index]['phantom serial number']
for key in get_NIST_ids():
    if estimate_type == 'mean':
        dataset_estimate = np.append(dataset_estimate, np.mean(df.loc[index][key]))
    elif estimate_type == 'median':
        dataset_estimate = np.append(dataset_estimate, np.median(df.loc[index][key]))
    else:
        # The exception must be raised, not just instantiated
        raise Exception('Unsupported dataset estimate type.')
    dataset_std = np.append(dataset_std, np.std(df.loc[index][key]))
ref_values = get_reference_NIST_values(serial_number)
temperature = df.loc[index]['phantom temperature']
temp_corrected_ref_values = temperature_correction(temperature, serial_number)
output_folder = Path("analysis/plots/04_alldatasets_scatter_NIST-temperature-corrected/")
## Initialize array
dataset_mean = np.zeros((1,14))
dataset_std = np.zeros((1,14))
version = np.array([])
temperature = np.array([])
ref_values = np.zeros((1,14))
ii=0
for index, row in df.iterrows():
    # Only process rows that contain NIST phantom ROI data
    if type(df.loc[index]['T1 - NIST sphere 1']) is np.ndarray:
        version = np.append(version,df.loc[index]['phantom serial number'])
        temperature = np.append(temperature, df.loc[index]['phantom temperature'])

        if version[ii] is None:
            version[ii] = 999 # Missing version, only known case is one where we have version > 42 right now.

        if temperature[ii] is None:
            temperature[ii] = 20 # Missing temperature, assume it to be 20C (reference temperature).

        if ii==0:
            ref_values = get_reference_NIST_values(version[ii])
            temp_corrected_ref_values = temperature_correction(temperature[ii], version[ii])
        else:
            ref_values = np.vstack((ref_values, get_reference_NIST_values(version[ii])))
            temp_corrected_ref_values = np.vstack((temp_corrected_ref_values, temperature_correction(temperature[ii], version[ii])))

        tmp_dataset_estimate = np.array([])
        tmp_dataset_std = np.array([])

        for key in get_NIST_ids():
            # Compare strings with ==, not identity (is)
            if estimate_type == 'mean':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.mean(df.loc[index][key]))
            elif estimate_type == 'median':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.median(df.loc[index][key]))
            else:
                # The exception must be raised, not just instantiated
                raise Exception('Unsupported dataset estimate type.')
            tmp_dataset_std = np.append(tmp_dataset_std, np.std(df.loc[index][key]))

        if ii==0:
            dataset_estimate = tmp_dataset_estimate
            dataset_std = tmp_dataset_std
        else:
            dataset_estimate = np.vstack((dataset_estimate, tmp_dataset_estimate))
            dataset_std = np.vstack((dataset_std, tmp_dataset_std))

        ii=ii+1
## Setup for plots
fig_id = 0
dims=ref_values.shape
file_prefix = 'alldatasets'
Inter-participant coefficients of variation (COV) were calculated by selecting a single T1 map per challenge participant and calculating the COV of the T1 means per sphere. The average inter-participant COV across the first five spheres representing the expected range in the human brain was 6.1 % (sphere 1 = 4.7 %, sphere 2 = 3.1 %, sphere 3 = 6.3 %, sphere 4 = 12.8 %, sphere 5 = 7.3 %). Two sites were clear outliers that had particular issues for sphere 4, likely due to a combination of an implementation error and a resulting uncertainty of where the signal null lies for this four-TI measurement at that T1 value; by removing these outliers, the mean inter-participant COV reduces to 4.1 % (sphere 1 = 5.4 %, sphere 2 = 3.5 %, sphere 3 = 2.5 %, sphere 4 = 4.2 %, sphere 5 = 4.9 %). One participant measured T1 maps with one phantom using one implemented protocol at 7 different sites using a single manufacturer, and so a mean intra-participant COV across the first five spheres for this case was calculated to be 2.9 % (sphere 1 = 4.9 %, sphere 2 = 3.5 %, sphere 3 = 2.6 %, sphere 4 = 2.0 %, sphere 5 = 1.6 %).
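As a minimal sketch of the calculation, the COV for one sphere is the standard deviation of the per-participant T1 means divided by their average, and the reported summary is the mean of the per-sphere COVs. The use of the sample standard deviation is an assumption, and the per-participant T1 means are invented; only the five intra-participant COVs are taken from the text.

```python
from statistics import mean, stdev

def coefficient_of_variation_pct(values):
    """COV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)

# Invented per-participant mean T1 values (ms) for a single sphere
sphere_means_ms = [1950.0, 2010.0, 1985.0, 2040.0, 1900.0]
print(f"COV for this sphere: {coefficient_of_variation_pct(sphere_means_ms):.1f} %")

# Per-sphere intra-participant COVs reported in the text (spheres 1 to 5)
intra_participant_cov_pct = [4.9, 3.5, 2.6, 2.0, 1.6]
print(f"Mean intra-participant COV: {mean(intra_participant_cov_pct):.1f} %")
```

Averaging the five reported intra-participant values reproduces the 2.9 % quoted above.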
Figure 4. Scatter plot comparing complex and magnitude-only fitted data. The markers are color-coded based on the implementation site, while their size represents the difference (annotated for scale) between two cases of T1 estimations for each sphere (from 1 to 7).
Figure 4 compares the mean T1 values measured using complex and magnitude-only data for the 11 datasets where authors provided both in their submissions. Note that these datasets are from the same acquisition, not two unique acquisitions. The scatter plot shows that for the range of T1 values expected in the brain (T1 > 500 ms), there is almost no difference in fitted T1 values between the two types of data (the highest outlier indicates a ~9 ms difference). However, for T1 values less than ~250 ms, there are large errors (please see the dashboard), which are likely due to poor fitting using a protocol that is not optimized for this range of T1 values.
Figure 5. Hierarchical shift function analysis comparing T1 estimation error throughout the entire range of voxel-wise distributions split into quantiles at 9 intervals (q1-q9, where q5 corresponds to the median difference) in spheres 1-6, for 20 measurements. The top row of each panel displays individual shift functions for each measurement (colored by vendor), quantifying the percent measurement error (overestimation if above the zero-crossing, underestimation otherwise) and characterizing it (e.g., straight lines indicate a homogeneous measurement error across voxels). The bottom row of each panel (gray markers) shows the average trend of bootstrapped differences at each decile in milliseconds (e.g., a 39 ms median (q5) underestimation trend in sphere 3). High-density intervals not intersecting the zero-crossing indicate a notable common trend at the respective decile.
The direction of the measurement error in the phantom is influenced by both the measurement site and the reference value, as indicated by the individual shift functions (Figure 5). For example, at sphere 1 (~2000 ms), the measurements (20 shown in total) are split nearly evenly on either side of the zero-crossing. On the other hand, for sphere 3 (~1000 ms), nearly all the measurements show underestimation, as the shift functions are located below the zero-crossing. Bootstrapped differences capture these trends, indicating a dominant overestimation at sphere 1 (median difference of +17 ms) and underestimation at sphere 3 (median difference of -39 ms). High-density intervals associated with these median differences do not indicate a common pattern for the former (intervals cross zero), whereas they reveal a notable underestimation trend at sphere 3 (intervals do not include zero). A similar common pattern is also observed for sphere 2 (median overestimation of 35 ms). In addition, the shape of individual shift functions conveys information about how voxel-wise distributions differ. For example, curved lines in sphere 2 from two different sites reveal that some of the ROI-selected voxels show drastically higher underestimation that cannot be captured by comparisons of central tendency alone. Lastly, the spread of shift functions around the zero-crossing does not indicate vendor-specific clustering for the selected measurements and reference values.
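To make the decile-wise comparison concrete, the sketch below computes the percent difference between the deciles of a voxel-wise T1 distribution and a reference value. It is a simplification: it uses plain empirical quantiles from `np.quantile` and omits the bootstrapping and hierarchical modelling of the analysis reported here. The function name `decile_percent_error` and the synthetic voxel values are hypothetical.

```python
import numpy as np

def decile_percent_error(voxel_t1_ms, reference_t1_ms):
    """Percent error of deciles q1..q9 of a voxel-wise T1 distribution vs. a reference.

    Positive values indicate overestimation, negative values underestimation.
    """
    probs = np.arange(1, 10) / 10.0  # 0.1, 0.2, ..., 0.9
    deciles = np.quantile(np.asarray(voxel_t1_ms, dtype=float), probs)
    return 100.0 * (deciles - reference_t1_ms) / reference_t1_ms

# Synthetic voxels (NOT challenge data): normally distributed around 960 ms,
# i.e. centred 40 ms below a 1000 ms reference.
rng = np.random.default_rng(0)
voxels = 960.0 + rng.normal(0.0, 10.0, size=500)
errors = decile_percent_error(voxels, reference_t1_ms=1000.0)
```

Values below zero correspond to underestimation relative to the reference, matching the sign convention used in Figure 5.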
Human results#
Figure 6 summarizes the results from human datasets submitted to this challenge, showing mean and standard deviation T1 values from the WM (genu) and GM (cerebral cortex) ROIs. The top plot collapses all datasets for each site, while the bottom plot shows each dataset separately. Mean WM T1 values across all submissions were 828 ± 38 ms in the genu and 854 ± 50 ms in the splenium, and mean GM T1 values were 1548 ± 156 ms in the cortex and 1188 ± 133 ms in the deep GM. Variation was lower overall in WM than in GM, possibly due to better ROI placement and less partial voluming in WM. Inter-participant coefficients of variation (COV) for independently-implemented imaging protocols were calculated using the one T1 map per submission that most closely matched the proposed protocol, and were 6.0 % for genu, 11 % for splenium, 16 % for cortical GM and 22 % for deep GM. One site (site 9) measured multiple subjects on three scanners using two different vendors, so intra-participant COVs for these centrally-implemented protocols were calculated over the T1 maps acquired at this site, and were 2.9 % for genu, 3.5 % for splenium, 6.9 % for cortical GM and 7.8 % for deep GM. It is worth noting that this site also had the best slice positioning, cutting through the AC-PC line and genu for proper ROI placement, particularly for the corpus callosum and deep GM.
Figure 6. Mean T1 values in two sets of ROIs, white matter (one 5x5 voxel ROI, genu) and gray matter (three 3x3 voxel ROIs, cortex). Top figure shows all datasets collapsed into sites, whereas the bottom shows each individual dataset.
%reset
from os import path
import os
# Clone the analysis repository only if it is not already present
if not path.isdir('analysis'):
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)
    # Remove notebooks and markdown files from the top level of the clone
    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))
# Imports
import warnings
warnings.filterwarnings("ignore")
from pathlib import Path
import pandas as pd
import nibabel as nib
import numpy as np
from analysis.src.database import *
import matplotlib.pyplot as plt
plt.style.use('analysis/custom_matplotlibrc')
plt.rcParams["figure.figsize"] = (20,5)
fig_id = 0
# Configurations
database_path = Path('analysis/databases/3T_human_T1maps_database.pkl')
output_folder = Path("analysis/plots/08_wholedataset_scatter_Human/")
estimate_type = 'mean' # median or mean
# Define functions
def plot_both_scatter(x1, x2, y, y_std,
                      title, x1_label, x2_label, y_label,
                      file_prefix, folder_path, fig_id):
    # Two stacked panels: y vs. x1 (top) and y vs. x2 (bottom)
    plt.rcParams["figure.figsize"] = (20,10)
    fig, axs = plt.subplots(2)
    fig.suptitle(title)
    axs[0].errorbar(x1, y, y_std, fmt='o', solid_capstyle='projecting')
    axs[0].set_xlabel(x1_label)
    axs[0].set_ylabel(y_label)
    axs[0].set_xticks(np.arange(0, np.max(x1), step=1))
    axs[1].errorbar(x2, y, y_std, fmt='o', solid_capstyle='projecting')
    axs[1].set_xlabel(x2_label)
    axs[1].set_ylabel(y_label)
    axs[1].set_xticklabels(labels=x2, rotation=90)
    # Zero-pad single-digit figure IDs so saved files sort correctly
    if fig_id<10:
        filename = "0" + str(fig_id) + "_" + file_prefix
    else:
        filename = str(fig_id) + "_" + file_prefix
    fig.savefig(folder_path / (str(filename) + '.svg'), facecolor='white')
    fig.savefig(folder_path / (str(filename) + '.png'), facecolor='white')
    fig_id = fig_id + 1
    plt.show()
    return fig_id
# Load database
df = pd.read_pickle(database_path)
genu_estimate = np.array([])
genu_std = np.array([])
splenium_estimate = np.array([])
splenium_std = np.array([])
deepgm_estimate = np.array([])
deepgm_std = np.array([])
cgm_estimate = np.array([])
cgm_std = np.array([])
ii = 0
for index, row in df.iterrows():
    # Compare strings with ==, not "is" (identity comparison of literals is unreliable)
    if estimate_type == 'mean':
        genu_estimate = np.append(genu_estimate, np.mean(df.loc[index]['T1 - genu (WM)']))
        genu_std = np.append(genu_std, np.std(df.loc[index]['T1 - genu (WM)']))
        splenium_estimate = np.append(splenium_estimate, np.mean(df.loc[index]['T1 - splenium (WM)']))
        splenium_std = np.append(splenium_std, np.std(df.loc[index]['T1 - splenium (WM)']))
        deepgm_estimate = np.append(deepgm_estimate, np.mean(df.loc[index]['T1 - deep GM']))
        deepgm_std = np.append(deepgm_std, np.std(df.loc[index]['T1 - deep GM']))
        cgm_estimate = np.append(cgm_estimate, np.mean(df.loc[index]['T1 - cortical GM']))
        cgm_std = np.append(cgm_std, np.std(df.loc[index]['T1 - cortical GM']))
    elif estimate_type == 'median':
        genu_estimate = np.append(genu_estimate, np.median(df.loc[index]['T1 - genu (WM)']))
        genu_std = np.append(genu_std, np.std(df.loc[index]['T1 - genu (WM)']))
        splenium_estimate = np.append(splenium_estimate, np.median(df.loc[index]['T1 - splenium (WM)']))
        splenium_std = np.append(splenium_std, np.std(df.loc[index]['T1 - splenium (WM)']))
        deepgm_estimate = np.append(deepgm_estimate, np.median(df.loc[index]['T1 - deep GM']))
        deepgm_std = np.append(deepgm_std, np.std(df.loc[index]['T1 - deep GM']))
        cgm_estimate = np.append(cgm_estimate, np.median(df.loc[index]['T1 - cortical GM']))
        cgm_std = np.append(cgm_std, np.std(df.loc[index]['T1 - cortical GM']))
    else:
        # Raise the exception; the original constructed it without raising
        raise Exception('Unsupported dataset estimate type.')
    ii = ii +1
# Store the IDs
indexes_numbers = df.index
indexes_strings = indexes_numbers.map(str)
x1_label='Site #'
x2_label='Site #.Meas #'
y_label="T$_1$ (ms)"
file_prefix = 'WM_and_GM'
folder_path=output_folder
x1=indexes_numbers
x2=indexes_strings
y=genu_estimate
y_std=genu_std